Are there any 'object detectors' in the hidden layers of CNNs trained to identify objects or scenes?
Various methods of measuring unit selectivity have been developed with the
aim of better understanding how neural networks work. But the different
measures provide divergent estimates of selectivity, and this has led to
different conclusions regarding the conditions in which selective object
representations are learned and the functional relevance of these
representations. In an attempt to better characterize object selectivity, we
undertake a comparison of various selectivity measures on a large set of units
in AlexNet, including localist selectivity, precision, class-conditional mean
activity selectivity (CCMAS), network dissection, the human interpretation of
activation maximization (AM) images, and standard signal-detection measures. We
find that the different measures provide different estimates of object
selectivity, with precision and CCMAS measures providing misleadingly high
estimates. Indeed, the most selective units had a poor hit-rate or a high
false-alarm rate (or both) in object classification, making them poor object
detectors. We fail to find any units that are even remotely as selective as the
'grandmother cell' units reported in recurrent neural networks. In order to
generalize these results, we compared selectivity measures on units in VGG-16
and GoogLeNet trained on the ImageNet or Places-365 datasets that have been
described as 'object detectors'. Again, we find poor hit-rates and high
false-alarm rates for object classification. We conclude that signal-detection
measures provide a better assessment of single-unit selectivity compared to
common alternative approaches, and that deep convolutional networks trained for image
classification do not learn object detectors in their hidden layers. (Comment: Published in Vision Research 2020, 19 pages, 8 figures)
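The signal-detection logic behind this conclusion can be illustrated with a short sketch: a unit only qualifies as an "object detector" if it combines a high hit rate with a low false-alarm rate, which the sensitivity index d' captures in a single number. This is a generic illustration (the function name, clipping correction, and example rates are ours, not the paper's code):

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float, eps: float = 1e-6) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates are clipped away from 0 and 1 so the inverse normal CDF
    stays finite, a standard correction in signal-detection analysis.
    """
    z = NormalDist().inv_cdf
    clip = lambda p: min(max(p, eps), 1.0 - eps)
    return z(clip(hit_rate)) - z(clip(fa_rate))

# A unit with a high hit rate but also a high false-alarm rate
# (the pattern described in the abstract) has low sensitivity...
low = d_prime(0.90, 0.85)
# ...whereas a genuine detector needs high hits AND low false alarms.
high = d_prime(0.90, 0.05)
```

Under this measure, `low` comes out around 0.25 while `high` is near 3, which is why a unit can look impressive on precision or CCMAS yet still make a poor detector.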
Does that sound right? A novel method of evaluating models of reading aloud
Nonword pronunciation is a critical challenge for models of reading aloud, but little attention has been given to identifying the best method for assessing model predictions. The most typical approach involves comparing the model's pronunciations of nonwords to pronunciations of the same nonwords by human participants and deeming the model's output correct if it matches any transcription of the human pronunciations. The present paper introduces a new ratings-based method, in which participants are shown printed nonwords and asked to rate the plausibility of the provided pronunciations, generated here by a speech synthesiser. We demonstrate this method with reference to a previously published database of 915 disyllabic nonwords (Mousikou et al., 2017). We evaluated two well-known psychological models, RC00 and CDP++, as well as an additional grapheme-to-phoneme algorithm known as Sequitur, and compared our model assessment with the corpus-based method adopted by Mousikou et al. We find that the ratings method: a) is much easier to implement than a corpus-based method, b) has a high hit rate and low false-alarm rate in assessing nonword reading accuracy, and c) provided a similar outcome as the corpus-based method in its assessment of RC00 and CDP++. However, the two methods differed in their evaluation of Sequitur, which performed much better under the ratings method. Indeed, our evaluation of Sequitur revealed that the corpus-based method introduced a number of false positives and, more often, false negatives. Implications of these findings are discussed.
Frequency Sensitivity of Neural Responses to English Verb Argument Structure Violations
How are verb-argument structure preferences acquired? Children typically receive very little negative evidence, raising the question of how they come to understand the restrictions on grammatical constructions. Statistical learning theories propose that stochastic patterns in the input contain sufficient clues. For example, if a verb is very common but never observed in transitive constructions, this would indicate that transitive usage of that verb is illegal. Ambridge et al. (2008) have shown that in offline grammaticality judgements of intransitive verbs used in transitive constructions, low-frequency verbs elicit higher acceptability ratings than high-frequency verbs, as predicted if relative frequency is a cue during statistical learning. Here, we investigate whether the same pattern also emerges in online processing of English sentences. EEG was recorded while healthy adults listened to sentences featuring transitive uses of semantically matched verb pairs of differing frequencies. We replicate the finding of higher acceptability ratings for transitive uses of low- vs. high-frequency intransitive verbs. Event-related potentials indicate a similar result: early electrophysiological signals distinguish between misuse of high- vs. low-frequency verbs. This indicates that online processing shows a similar sensitivity to frequency as offline judgements, consistent with a parser that reflects an original acquisition of grammatical constructions via statistical cues. However, the nature of the observed neural responses was not of the expected, or an easily interpretable, form, motivating further work into neural correlates of online processing of syntactic constructions.
The human visual system and CNNs can both support robust online translation tolerance following extreme displacements
Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).
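The global-average-pooling (GAP) modification mentioned above can be sketched in a few lines: each feature map is averaged over its spatial positions, so the pooled descriptor is unchanged when an activation pattern is displaced within the map. A minimal NumPy illustration (our own sketch, not the models used in the study):

```python
import numpy as np

def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Collapse (channels, height, width) activations to one value
    per channel by averaging over the two spatial dimensions."""
    return feature_maps.mean(axis=(-2, -1))

# The same local activation at two different positions yields an
# identical pooled descriptor -- a simple form of translation tolerance.
fmap = np.zeros((1, 8, 8))
fmap[0, 1, 1] = 1.0                       # pattern near top-left
shifted = np.roll(fmap, shift=5, axis=2)  # same pattern, displaced
assert np.allclose(global_average_pool(fmap),
                   global_average_pool(shifted))
```

Note that a real network achieves tolerance only in combination with convolutional weight sharing; GAP merely discards the remaining positional information at the readout stage.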
Clarifying status of DNNs as models of human vision
On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.
Children Use Statistics and Semantics in the Retreat from Overgeneralization
How do children learn to restrict their productivity and avoid ungrammatical utterances? The present study addresses this question by examining why some verbs are used with un- prefixation (e.g., unwrap) and others are not (e.g., *unsqueeze). Experiment 1 used a priming methodology to examine children's (3-4; 5-6) grammatical restrictions on verbal un- prefixation. To elicit production of un- prefixed verbs, test trials were preceded by a prime sentence, which described reversal actions with grammatical un- prefixed verbs (e.g., Marge folded her arms and then she unfolded them). Children then completed target sentences by describing cartoon reversal actions corresponding to (potentially) un- prefixed verbs. The younger age-group's production probability of verbs in un- form was negatively related to the frequency of the target verb in bare form (e.g., squeez/e/ed/es/ing), while the production probability of verbs in un- form for both age groups was negatively predicted by the frequency of synonyms to a verb's un- form (e.g., release/*unsqueeze). In Experiment 2, the same children rated the grammaticality of all verbs in un- form. The older age-group's grammaticality judgments were (a) positively predicted by the extent to which each verb was semantically consistent with a semantic 'cryptotype' of meanings - where 'cryptotype' refers to a covert category of overlapping, probabilistic meanings that are difficult to access - hypothesised to be shared by verbs which take un-, and (b) negatively predicted by the frequency of synonyms to a verb's un- form. Taken together, these experiments demonstrate that children as young as 4;0 employ pre-emption and entrenchment to restrict generalizations, and that use of a semantic cryptotype to guide judgments of overgeneralizations is also evident by age 6;0. Thus, even early developmental accounts of children's restriction of productivity must encompass a mechanism in which a verb's semantic and statistical properties interact.
Quantifying sources of variability in infancy research using the infant-directed-speech preference
Psychological scientists have become increasingly concerned with issues related to methodology and replicability, and infancy researchers in particular face specific challenges related to replicability: for example, high-powered studies are difficult to conduct, testing conditions vary across labs, and different labs have access to different infant populations. Addressing these concerns, we report on a large-scale, multisite study aimed at (a) assessing the overall replicability of a single theoretically important phenomenon and (b) examining methodological, cultural, and developmental moderators. We focus on infants' preference for infant-directed speech (IDS) over adult-directed speech (ADS). Stimuli of mothers speaking to their infants and to an adult in North American English were created using seminaturalistic laboratory-based audio recordings. Infants' relative preference for IDS and ADS was assessed across 67 laboratories in North America, Europe, Australia, and Asia using the three common methods for measuring infants' discrimination (head-turn preference, central fixation, and eye tracking). The overall meta-analytic effect size (Cohen's d) was 0.35, 95% confidence interval = [0.29, 0.42], which was reliably above zero but smaller than the meta-analytic mean computed from previous literature (0.67). The IDS preference was significantly stronger in older children, in those children for whom the stimuli matched their native language and dialect, and in data from labs using the head-turn preference procedure. Together, these findings replicate the IDS preference but suggest that its magnitude is modulated by development, native-language experience, and testing procedure. (This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 798658.)
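For readers unfamiliar with the effect-size metric reported above, a one-sample standardized effect of this kind is the mean preference divided by its standard deviation, with a confidence interval from a large-sample standard-error formula. The sketch below is our generic illustration, not the study's analysis pipeline (which pooled estimates meta-analytically across 67 labs), and the example scores are hypothetical:

```python
import math
from statistics import mean, stdev

def cohens_d_one_sample(scores, z_crit: float = 1.96):
    """One-sample Cohen's d (mean / SD) with an approximate 95% CI
    using the large-sample standard error sqrt(1/n + d^2 / (2n))."""
    n = len(scores)
    d = mean(scores) / stdev(scores)
    se = math.sqrt(1.0 / n + d**2 / (2.0 * n))
    return d, (d - z_crit * se, d + z_crit * se)

# Hypothetical per-infant IDS-minus-ADS looking-time differences (seconds):
d, (lo, hi) = cohens_d_one_sample([0.4, -0.1, 0.6, 0.2, 0.3, 0.0, 0.5, 0.1])
```

An interval that excludes zero, as in the [0.29, 0.42] result above, is what licenses the claim that the preference is reliably above zero.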